Nigeria's official languages are English, Yorùbá, Igbo and Hausa. The study 
reported in this paper developed a learning tool that assists learners in 
learning the Yorùbá language through its alphabet. The study is critical to the 
Yorùbá language because the language is endangered, and different learning 
tools are needed to mitigate its extinction. A Yorùbá word perfect system was 
developed to assist people in learning the Yorùbá language. English and Yorùbá 
word formation was examined using a computational morphological approach 
(word formation). The theoretical framework used finite state automata (FSA) 
to realise the different ways of combining consonants and vowels to form 
words. Two to five letter words were considered. The system was designed and 
implemented using UML tools and the Python programming language. The system 
teaches users how words are formed and the number of syllables in each word. 
The user need not know how to tone mark a word before using the system. Any 
word typed is analysed according to its number of syllables. This approach 
produces representatives of all parts of speech (POS) of the two languages. 
It also produces corpora for the two languages. 
1. INTRODUCTION 

Firstly, can other African languages be learnt using this approach? In particular, can Igbo and Hausa 
(the other two Nigerian languages) be learnt using this approach? Yorùbá, Igbo and Hausa are tonal languages, 
as are possibly some other African languages. Secondly, can other world languages be learnt using this approach? 
Morphological analysis is the first step in many natural language processing tasks, such as parsing, 
machine translation, information retrieval and part of speech tagging, among others [1]. 

Morphology is the study of the internal structure of the word. Morphological analysis can be used to 
retrieve the grammatical features and properties of a morphologically inflected word [2]. It is the process of 
segmenting words into morphemes and analysing the word formation. It is a primary step for various types of 
text analysis of any language [2]. 

As noted in [3], morphology studies the internal structure of words. The building blocks are called 
morphemes. One distinguishes between free and bound morphemes. Free morphemes are those which can stand 
alone as words. Bound morphemes are those that always have to attach to other morphemes (Söhn, 2008). 
A morph or morpheme is a minimal distinctive unit of grammar. For example, a word like unselfish (àinfifétaraeninikan) 
has three morphemes in the English language: un-, self and -ish. In Yorùbá, it has nine morphemes, 
à + ì + ni + ife + ti + ara + eni + ni + kan, into which it can be segmented [4]. 
An allomorph is a possible way of realizing a morpheme. For example, the plural morpheme in English 
is realised by the allomorphs -es, -ies and -s, e.g., box (boxes), fly (flies) and book (books) [3]. The morpheme 
which expresses plurality in English appears in several variants: cap/caps, log/logs, force/forces, 
mouse/mice, sheep/sheep, etc. Two of these variants are the voiceless [s] of caps and the voiced [z] of logs. 
The irregular shape of mice can also be regarded as an allomorph of the plural morpheme, and the phenomenon is 
called allomorphy. Plural morphemes in Yorùbá are not expressed in this manner. For example, pupil 
is ọmọ ilé-ẹ̀kọ́ and pupils is àwọn ọmọ ilé-ẹ̀kọ́. 

Word formation in the two languages follows a simple process. In English, a vowel can attach to a 
consonant to form a word, for example, in, on, of (mostly prepositions), etc. In Yorùbá, a consonant can attach 
to a vowel to form a word, or several words depending on the tone mark (usually verbs). Such a word is 
usually the root of a whole class of words, for example, dé (cover), gé (cut), kà (read), etc. Prefixing a vowel to 
these words forms a noun from the same root, for example, dé -> cover, a + dé -> crown; lù -> beat, 
ì + lù -> drum [4]. 

In [5], extensive work on Yorùbá word syllabication was done. The software developed can be 
used for tone marking and under-dotting of Yorùbá words. Our concern in this paper is to develop a tool that 
simplifies word formation and shows how the syllables in a word can be identified. The tool can be used for 
other related languages within Africa and other parts of the world. 

The remaining part of the paper is organized as follows: section 2 examines related works; section 3 
gives the theoretical framework; section 4 presents the system design; section 5 discusses the results, while 
section 6 concludes the paper. 


2. RELATED WORKS 

The author of [5] is of the opinion that the Yorùbá language will be endangered if urgent steps are not 
taken by the stakeholders. The study examined the low usage of the language in some states in the South West 
of Nigeria. The findings show that students find it difficult to communicate with people using the Yorùbá language. 

The study in [6] examined the factors responsible for Yorùbá language endangerment. 
The results show that lack of commitment to the indigenous language, habitat displacement, colonial 
legacy, and defective language planning are responsible for its gradual extinction. The study concluded that the 
Yorùbá language should be used at home, and that it should be a criterion for post-primary school admission. 

In [7], a basic way of learning the Yorùbá language is laid down. In the book, the author explains the 
different types of Yorùbá vowels and their features, discusses the phonology and morphology of the Yorùbá 
language, and explains how words are formed. The book is informative and will help learners. 

The study in [8] examined the contributions of the mother tongue in pre-nursery or primary early childhood 
education. The study raised six issues: language background, situation in Nigeria, policy documentation on 
language in Nigeria, language theory and development, and problem statement and rationale. The successes and 
failures of mother tongue usage in the country need to be reviewed in order to address the issues mentioned. 

The author of [9] was of the opinion that proper implementation of educational policies in Nigeria will 
increase the learning of indigenous languages, in particular the Yorùbá language. 

In [10], it was noted that one of the Nigerian educational policies stipulates that “the first three years of 
primary education should be taught in learners’ mother tongue”. The study conducted shows that the primary school 
teachers used in the study were able to teach the subjects (like social studies), but with challenges. 

The study in [11] examined which of the Yorùbá language (the learners' mother tongue) and the English 
language, as a medium of instruction for teaching social studies in the nursery school, would make the 
pupils perform better. The experiment was carried out and the results show that the pupils taught 
in their mother tongue (Yorùbá language) performed better than those taught in the English language. 

In [12], learning is defined as a change in knowledge attributable to experience. Learning involves a 
change in the learner; what changes is the learner's knowledge, and the cause of the change is the learner's 
experience. Learning is not measured through one operational definition. Rather, learning is a blend of 
comprehension, transfer of new material, and retention of material. In fact, most transfer studies focus purely 
on the similarities and differences between the contexts of initial learning and subsequent transfer [13]. In the 
current study, learning is evaluated using a multimedia device. 

The author of [13] proposed the use of an e-learning approach to teach and learn the Yorùbá language, and 
opined that information and communication technology (ICT) is a good tool for increasing people's interest in 
learning the language. 

In [14], a Windows mobile application for learning the Yorùbá language was developed. The learner can learn 
how to read the alphabet, numbers and common words in the Yorùbá language. 
3. THEORETICAL FRAMEWORK 

The finite state automata (FSA) technique was used to analyse the different ways of forming words from the 
two languages’ alphabets. The FSA was used for two to five letter words, while some notable examples of 
single letter words are mentioned. These are discussed in detail in the following subsections. 


3.1. Single letter words 

In the English language, ‘I’ (pronoun) and ‘a’ (determiner) are the single letter words. In the Yorùbá 
language, single letter words are mostly pronouns, for example: 

‘Ó’ (3rd person singular: s/he and it) and ‘A’ (we). 


3.2. Two letter words 

Two letter words (KF) in the Yorùbá language are verbs and pronouns, and have a single syllable. In the English 
language, two letter words can be CV (he, me, we, to, etc.) or VC (an, on, in, it, of, etc.). Figure 1 shows 
how two letter words are formed using the finite state automata. Table 1 shows possible Yorùbá words that 
can be generated from Figure 1. Some of the possible KF or CV combinations are not sensible semantically, 
but they are sensible syntactically. The focus here is on the syllable, which gives meaning 
to all these possible combinations. Figure 1 also shows the three Yorùbá tone marks: 
high tone (H: á), low tone (L: à), and mid tone, which has no symbolic representation (a). Tone 
marks on vowels are used to resolve possible ambiguities in some words. If the tone marks cannot 
distinguish the words, then such words can be distinguished contextually. A sensible two letter combination can 
form at most three different words (bá, bà and ba), and each word may have different meanings. For example, 
‘ba’ is an ambiguous word: it means touch down or ferment. 

The pure syllabic nasals (m and n) in the Yorùbá language can be tone marked within a sentence or a phrase. 
In most cases they take a high tone, and they are single letter words, for example: ḿ bọ̀ and ń lọ. Also, the consonant 
Ṣ ṣ is the only one that has an under-dot. In most cases it is used to stress a word (e.g., telifiṣan - television). 
The letter GB gb is a combination of two letters (also termed a digraph); it is one consonant, not two consonants 
(gbé, gba, gbo, etc.). 

Figure 1 shows the possible ways of forming two letter words in both languages. Yorùbá has 
one way of forming two letter words: Konsonanti (K) + Faweli (F) => KF. The English language has 
two forms: consonant (C) + vowel (V) => CV, and VC. Examples are shown in Table 2. 


Table 1. Yorùbá language two letter words 

K+F    High tone (á é ẹ́ í ó ọ́ ú)    Low tone (à è ẹ̀ ì ò ọ̀ ù)    Mid tone (a e ẹ i o ọ u, no tone marks) 
B b 
B+a    bá                            bà                           ba 
B+e    bé                            bè                           be 
B+ẹ    bẹ́                            bẹ̀                           bẹ 
B+i    bí                            bì                           bi 
B+o    bó                            bò                           bo 
B+ọ    bọ́                            bọ̀                           bọ 
B+u    bú                            bù                           bu 
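
The construction behind Table 1 can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the names `VOWELS`, `TONES`, `kf_forms` and `kf_table` are hypothetical, and the tone marks are assumed to be attached as Unicode combining characters.

```python
import unicodedata

# The seven Yorùbá vowels (F); under-dotted vowels are kept distinct.
VOWELS = [unicodedata.normalize("NFC", v) for v in ["a", "e", "ẹ", "i", "o", "ọ", "u"]]

# Tone marks as combining characters; the mid tone is left unmarked.
TONES = {"high": "\u0301", "low": "\u0300", "mid": ""}


def kf_forms(consonant, vowel):
    """Return the high, low and mid tone forms of one K + F combination."""
    return {
        tone: unicodedata.normalize("NFC", consonant + vowel + mark)
        for tone, mark in TONES.items()
    }


def kf_table(consonant):
    """Build one block of Table 1: a consonant combined with every vowel."""
    return {vowel: kf_forms(consonant, vowel) for vowel in VOWELS}


table = kf_table("b")
for vowel, forms in table.items():
    print(f"b+{vowel}", forms["high"], forms["low"], forms["mid"])
```

Calling `kf_table` with another consonant (for example the digraph "gb") would produce the corresponding block of candidate words; whether each candidate is a meaningful word still has to be checked against a lexicon.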


Table 2. English language two letter words 

CV    POS            VC    POS 
he    PRN            if    Conjunction 
to    Preposition    on    Preposition 
be    Verb           an    Determiner 
me    PRN            it    PRN 
we    PRN            in    Preposition 
go    Verb           of    Preposition 
do    Verb           us    PRN 


Figure 1. FSA state diagram for English and Yorùbá two letter words 


3.3. Three letter words 

Figure 2 shows the state diagram of three letter words for the English and Yorùbá languages. For 
the Yorùbá language there are three possible combinations. 

The first scenario (FKF) is shown in Table 3 (possible combinations of F, K, and F). It means that 
the seven vowels can be combined with a consonant as prefix and postfix. The tone marks and under-dots produce 
different words. The example in Table 3 is A + b + other vowels. 


Figure 2. FSA State Diagram of English and Yorùbá Three Letter Words 


Table 3. Scenario 1: F + KF 

F + KF    Yorùbá word    Syllable    POS     English word    POS 
A + ba    Aba            2           verb    to incubate     verb 
A + ba    Aba            2           noun    hamlet          noun 
A + ba    Aba            2           noun    ladder          noun 
A + be    Abe            2           noun    village name    noun 
A + be    Abe            2           noun    blade           noun 
A + bi    Àbí            2                   isn't it 
A + bo    Abo            2           noun    female          noun 
A + bo    Àbo            2           noun                    noun 


In scenario 2 (KFF), KFF produces one word, as shown in Table 4. Scenario 3 (KFK), as shown 
in Table 5, gives the combination K + FK. The FK can represent the five nasal vowels mentioned 
in the previous section. 


Table 4. Scenario 2: KF + F 

KF + F    Yorùbá word    Syllable    POS    English word    POS 
Na + a    Náà            2           Det    the             Det 


Table 5. Scenario 3: K + FK 

K + FK    Yorùbá word    Syllable    POS        English word    POS 
K + an    kan            2           V          break           V 
K + an    kan            2           V          knock           V 
K + an    kan            2           V          sour            V 
Y + en    yen            2           article    that            article 
G + un    gun            2           V          stab            V 
Y + an    yán            2           V          yawn            V 


Table 6 shows the different combinations of consonants and vowels that form three letter words in the English language. 
The combinations are: CCC, CVC, VCC, CCV, VVC, CVV and VCV, with an example word for each. 


Table 6. Three letter words formation in English language 


Combinations English word POS 

CCC Fly noun 

CVC Low Adjective 
VCC Egg noun 

CCV The Determiner 
VVC Oil noun 

CVV See verb 

VCV Use verb 
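
The C/V labelling used in Table 6 can be reproduced with a short function. This is a minimal sketch under the assumption, implied by the table (Fly is CCC and Low is CVC), that only a, e, i, o and u count as vowels; the name `cv_pattern` is hypothetical.

```python
# Assumption taken from Table 6: y and w are treated as consonants.
ENGLISH_VOWELS = set("aeiou")


def cv_pattern(word):
    """Map each letter to V (a, e, i, o, u) or C (every other letter)."""
    return "".join("V" if ch in ENGLISH_VOWELS else "C" for ch in word.lower())


for word in ["fly", "low", "egg", "the", "oil", "see", "use"]:
    print(word, cv_pattern(word))
```

The same function applies unchanged to longer English words, since the pattern is computed letter by letter.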


3.4. Four letter words 

Figure 3 depicts four letter word formation in the English and Yorùbá languages. KFKF and FKFK are 
the two possible combinations for Yorùbá four letter words, as shown in Tables 7 and 8. These are sample words 
from the possible words that can be generated. 
Table 7. Scenario 1: KFKF 

KF + KF    Yorùbá word    Syllable    POS     English word    POS 
da + ra    dára           2           Adj     good            Adj 
pa + de    pádé           2           Verb    to close        Verb 
pa + de    pàdé           2           Verb    to meet         Verb 
pa + da    padà           2           Verb    to return       Verb 
Ba + ba    Baba           2           Noun    father          Noun 


Figure 3. FSA state diagram for English and Yorùbá languages four letter words 


There are eight (8) combinations of English four letter words as shown in Table 9. The combinations 
are: CVCV, CVVC, CVCC, VCVV, VCVC, CCVC, VCCV, and VVCC. 


Table 8. Scenario 2: F + KFK 

F + KFK     Yorùbá word    Syllable    POS     English word       POS 
E + gbon    Egbón          2           Noun    brother            noun 
E + gbin    Egbin          2           Noun    animal             noun 
E + dun     Ẹdun           2           Noun    animal             noun 
a + din     àdín           2           Noun    palm kernel oil    noun 
i + yen     iyen           2           Det     that               det 
A + kan     Akàn           2           Noun    crab               noun 
e + hin     ehin           2           Noun    back               noun 
a + han     ahán           2           Noun    tongue             noun 
ọ + kan     okan           2           Noun    heart              noun 
e + dun     edun           2           Noun    animal             noun 
i + dun     idun           2           Noun    bed bug            noun 


Table 9. English four letter words 

Combinations    English word    POS 
CVCV            Zone            noun 
                Base            noun 
                Hake            verb 
CVVC            Beak            noun 
                quin            noun 
                joey            noun 
CVCC            tell            verb 
VCVV            Aqua            noun 
VCVC            Epic            verb 
CCVC            Clap            verb 
VCCV            Abba            noun 
VVCC            Oink            noun 


3.5. Five letter words 

Five letter word patterns are shown in Figure 4 for the two languages. There are two possible ways of 
forming five letter words in the Yorùbá language: F + KF + KF, as depicted in Table 10, and F + F + KFK, 
as depicted in Table 11. 


Figure 4. Pattern analysis of English and Yorùbá five letter words 


Table 10. Scenario 1: F + KF + KF 

Combinations    Yorùbá word    Syllable    POS     English equivalent    POS 
F + KF + KF 
I + sé + dá     Iseda          3           verb    create                verb 
Ò + dò + dó     Òdòdó          3           noun    flower                noun 
À + kó + kò     Àkókò          3           noun    time                  noun 

Table 11. Scenario 2: F + F + KFK 

Combinations    Yorùbá word    Syllable    English equivalent    POS 
F + F + KFK 
O + ò + rùn     Oòrùn          3           sun                   noun 
Ò + ó + rùn     Òórùn          3           odour                 noun 

There are seven possible combinations for English language five letter words, as shown in Table 12. 


Table 12. English language five letter words 
Combinations English word POS 


CCVCV Stone noun 
CVCVC Widow noun 
CVVCC Round noun 
CCVVC Broom noun 
CCVCC Chest noun 
CVCCV Title noun 
VCVVC Ocean noun 


4. SYSTEM DESIGN 
4.1. System framework design 

The system framework design covers the system database and the software design. In the theoretical 
framework section, two to five letter words were analysed for the two languages. The features of each word 
length were discussed in relation to the expected number of tones on each word. These tones determine the 
number of syllables formed. Morphemes are different from words, as explained in the previous sections. 
According to Figure 5, the system determines whether a typed word is a Yorùbá word or not. If it is a Yorùbá 
word, the system checks the number of letters to know whether it is a two, three, four or five letter word. 
The system then compares the word with the words in the database. If it matches, the system displays the 
possible words that the typed word can represent. The user need not tone mark a word while typing it. The 
system analyses the word to determine how many possible words can be formed from that single word, and 
displays them. For example, if igba is typed, the system displays five different words after the analysis: 
igba (200), ìgbà (season), ìgbá (garden egg), igbà (rope for climbing palm tree) and igbá (calabash). The tone 
marks determine the syllables; in this case there are two syllables, i-gba, which means the tone marks can be 
on either of the two vowels (i or a). These tone marks determine how words are pronounced (phonology). 
Yorùbá orthography (writing style) depends on these tone marks to make meaning out of a word. This might not 
be critical in speech, but it is critical in text. Without tone marks, the reader has to fix the meaning from 
the context, and such a meaning may deviate from the actual intended meaning. The system displays the total 
number of syllables, the parts of speech (POS) and the English language equivalent. The system activity 
diagram, showing the various actions performed by the system, is shown in Figure 5. 


[Figure 5 diagram: search Yorùbá words; if the input is not a Yorùbá word, display an error message; otherwise 
branch to two letter, three letter, four letter or five letter words.] 


Figure 5. The System Activities Diagram 
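
The tone-insensitive matching described in this subsection can be sketched as follows. This is an assumption about one way to do it, not the authors' code: tone marks and under-dots are stripped with Unicode decomposition, and the stripped form is used as the lookup key. The names `strip_marks`, `build_index` and `analyse` are hypothetical, and the sample entries follow the igba example in the text.

```python
import unicodedata
from collections import defaultdict


def strip_marks(word):
    """Remove tone marks and under-dots so that untoned input can be matched."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Sample entries (tone-marked word, English gloss); the tone marking is illustrative.
ENTRIES = [
    ("igba", "two hundred"),
    ("ìgbà", "season"),
    ("ìgbá", "garden egg"),
    ("igbà", "rope for climbing palm tree"),
    ("igbá", "calabash"),
]


def build_index(entries):
    """Group tone-marked words under their unmarked spelling."""
    index = defaultdict(list)
    for word, gloss in entries:
        index[strip_marks(word)].append((word, gloss))
    return index


INDEX = build_index(ENTRIES)


def analyse(typed):
    """Return every tone-marked word that the typed word can represent."""
    return INDEX.get(strip_marks(typed), [])


for word, gloss in analyse("igba"):
    print(word, "-", gloss)
```

Because the key is computed from the typed word as well, the lookup gives the same result whether or not the user supplies tone marks.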


4.2. Database design 

The database design is based on the theoretical analysis done in the previous section. The structure of 
the database is different from the tables presented in the previous section. The database consists of words from 
two letter words to five letter words. The words were manually tone marked. Four things were considered in 
the database: the Yorùbá word (tone marked), its syllables, its equivalent in the English language, and its POS. 
Each word length was designed separately for easy access. The system compares every word typed by the user with 
the words in the database. The database was designed to accept new words, but they must be vetted by the system 
administrator. 
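
A possible SQLite layout for such a database is sketched below using Python's standard `sqlite3` module. The table name, the column names, the single-table layout with a `letters` column (instead of one table per word length) and the `approved` flag for administrator vetting are all assumptions made for illustration; the paper does not give the actual schema. The sample rows come from Table 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute(
    """
    CREATE TABLE words (
        id        INTEGER PRIMARY KEY,
        word      TEXT NOT NULL,              -- tone-marked Yorùbá word
        plain     TEXT NOT NULL,              -- spelling without tone marks (lookup key)
        letters   INTEGER NOT NULL,           -- word length, two to five
        syllables INTEGER NOT NULL,
        english   TEXT NOT NULL,
        pos       TEXT NOT NULL,
        approved  INTEGER NOT NULL DEFAULT 0  -- 1 once vetted by the administrator
    )
    """
)

rows = [
    ("dára", "dara", 4, 2, "good", "Adj", 1),
    ("pádé", "pade", 4, 2, "to close", "Verb", 1),
    ("pàdé", "pade", 4, 2, "to meet", "Verb", 1),
    ("padà", "pada", 4, 2, "to return", "Verb", 0),  # submitted but not yet vetted
]
conn.executemany(
    "INSERT INTO words (word, plain, letters, syllables, english, pos, approved) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)


def lookup(plain):
    """Return the vetted entries whose unmarked spelling matches the typed word."""
    cursor = conn.execute(
        "SELECT word, syllables, english, pos FROM words "
        "WHERE plain = ? AND approved = 1 ORDER BY id",
        (plain,),
    )
    return cursor.fetchall()


print(lookup("pade"))
```

Keeping unvetted words in the same table with a flag means a new word becomes visible to learners only after the administrator sets `approved` to 1.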


4.3. Software design and implementation 
Figure 6 shows how the different modules relate to each other. There are four modules: startpage, 
selectview, analyse, and selectsearch. The startpage coordinates the other modules. The selectview displays the 
word’s attributes: the tone marked word, syllable, and POS. The selectsearch is the database that the 
startpage can access. SQLite was used to design the database. The code was implemented using the Python 
programming language. The system class diagram is shown in Figure 6. 


[Figure 6 diagram: the SelectView class holds the attributes showWord, showSyllable, showPOS, showtonemark and 
attribute, each with a getter and a setter; a second class holds the database, words, parts of speech, syllable 
and attribute fields, each with a getter and a setter.] 


Figure 6. The System Class Diagram 


5. RESULTS AND DISCUSSION 
The system implementation considered all the modules and units needed to implement the whole system. 
Figure 7 depicts the Yorùbá word formation system. The system has a user plane, where the user can type a word. 
Below the user plane are the analyse, reset, and close buttons. The user can analyse the word by clicking the 
analyse button. The user can also reset and type a new word. Figure 8 shows a sample output of the system. 

The GUI displays the results for the analysed word: the number of syllables, the Yorùbá language 
tone marked words, the equivalent words in the English language, and the POS. 


Figure 7. The System Graphical User Interface (GUI) 


[Figure 8 screenshot: result of analysis for a typed word, showing the word size, the number of syllables (2), 
and the tone marked words with their English equivalents and POS.] 


Figure 8. Sample Output of the System 
6. CONCLUSION 

Many things were considered in this study, and there are many ways of using them. The alphabet 
combinations can provide corpora of reasonable size for the English and Yorùbá languages, which can be used 
for machine translation. The FSA state diagrams can be applied to other language pairs to see whether the 
approach is suitable. The final application can be used by Yorùbá teachers at any level of education. It is 
similar to (but more detailed than) the Nursery/Primary English word perfect. In future, we will make it a 
multimedia system that includes text, pictures and sounds. 